[SPARK-27811][Core][Docs] Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead #24671
beliefer wants to merge 12 commits into apache:master from beliefer:improve-docs-of-overhead
Conversation
I think the proposed new description is inaccurate because interned strings and native overheads (e.g. Netty direct buffers) aren't allocated outside the executor process, per se (since off-heap memory is still in the JVM's own process address space).
Rather, I think the distinction is that these non-heap allocations don't count towards the JVM's total heap size limit, a factor which needs to be taken into account for container memory sizing: if you request a Spark executor with, say, a 4 gigabyte heap, then the actual peak memory usage of the process (from the OS's perspective) is going to be more than 4 gigabytes due to these types of off-heap allocations.
If we want to avoid container OOM kills (by YARN or Kubernetes) then we need to account for this extra overhead somewhere, hence these *memoryOverhead "fudge-factors": setting the memory overhead causes us to request a container whose total memory size is greater than the heap size.
That said, increasing the memory overhead does also result in additional memory headroom that can be used by non-driver/executor processes (like overhead from other processes in the container). Maybe we could state this explicitly, e.g. something like "... non-heap memory, including off-heap memory (e.g. ...) and memory used by other non-driver/executor processes running in the same container".
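A rough sketch of the container-sizing arithmetic described above, assuming the default 10% overhead factor with a 384 MiB floor that is applied when no explicit memoryOverhead is set; the concrete heap sizes below are only illustrative:

```scala
// Sketch: how the cluster manager's container request is derived from the heap size
// plus the memory overhead. Assumes the default overhead factor (0.10) and 384 MiB
// minimum; the specific numbers are illustrative, not taken from this PR.
object ContainerSizingSketch {
  private val OverheadFactor = 0.10
  private val OverheadMinMiB = 384L

  /** Total container memory (MiB) requested for an executor with heap `heapMiB`. */
  def containerMiB(heapMiB: Long, explicitOverheadMiB: Option[Long] = None): Long = {
    val overhead = explicitOverheadMiB.getOrElse(
      math.max((heapMiB * OverheadFactor).toLong, OverheadMinMiB))
    heapMiB + overhead
  }

  def main(args: Array[String]): Unit = {
    // With a 4 GiB heap the container request is larger than 4 GiB:
    println(containerMiB(4096))             // 4096 + 409 = 4505
    // Raising spark.executor.memoryOverhead grows the container, not the JVM heap:
    println(containerMiB(4096, Some(1024))) // 4096 + 1024 = 5120
  }
}
```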
You're correct that the relationship between these configurations and the Tungsten off-heap configuration is a bit confusing. To address that confusion, I'd prefer to expand the documentation to explicitly mention these *memoryOverhead configurations in the documentation for the Tungsten off-heap setting: I think that doc should recommend raising the memoryOverhead when setting the Tungsten config.
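A minimal sketch of the pairing suggested above, with sizes that are illustrative assumptions rather than values from this PR:

```scala
// Enabling Tungsten off-heap memory does not by itself enlarge the container request,
// so the memoryOverhead should be raised to cover it on YARN/Kubernetes.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "4g")           // JVM heap
  .set("spark.memory.offHeap.enabled", "true")  // Tungsten off-heap allocations
  .set("spark.memory.offHeap.size", "2g")
  // Leave container headroom for the 2g of Tungsten off-heap plus other non-heap
  // usage (Netty direct buffers, interned strings, thread stacks, ...):
  .set("spark.executor.memoryOverhead", "3g")
```

With a layout like this, the container request leaves explicit room for the Tungsten off-heap region instead of relying on the default overhead factor.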
It might also help to more explicitly clarify that these settings only make sense in a containerized / resource limited deployment mode (e.g. not in standalone mode).
In summary, I think there's definitely room for confusion with the existing fix, but I think the right solution is an expansion of the docs to much more explicitly clarify the relationship between both sets of configurations, not a minor re-word.
Thanks for your review. Yes, I overlooked the detail that off-heap memory is still in the JVM's own process address space.
@JoshRosen Could you review this PR again and see whether there are any other issues?
Merged to master
@srowen Thanks for merging. I had thought that when two reviewers requested changes, both of them needed to approve before the PR could be merged.
What changes were proposed in this pull request?
I found that the docs of spark.driver.memoryOverhead and spark.executor.memoryOverhead are a little ambiguous. For example, the original doc of spark.driver.memoryOverhead starts with "The amount of off-heap memory to be allocated per driver in cluster mode." But MemoryManager also manages a memory region named off-heap, which is used for allocations in Tungsten mode. So the description of spark.driver.memoryOverhead is easy to confuse with that region, and spark.executor.memoryOverhead has the same problem.
How was this patch tested?
Existing UT.